We investigate data-driven texture modeling via analysis and synthesis with generative adversarial networks. For network training and testing, we have compiled a diverse set of spatially homogeneous textures, ranging from stochastic to regular. We adopt StyleGAN3 for synthesis and demonstrate that it produces diverse textures beyond those represented in the training data. For texture analysis, we propose GAN inversion using a novel latent domain reconstruction consistency criterion for synthesized textures, and iterative refinement with Gramian loss for real textures. We propose perceptual procedures for evaluating network capabilities, exploring the global and local behavior of latent space trajectories, and comparing with existing texture analysis-synthesis techniques.
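As a rough illustration of the Gramian loss mentioned above, the sketch below compares the channel-correlation (Gram) matrices of two feature maps. It is a minimal, assumption-laden example: the abstract does not say which network features are used or how the loss is normalized, so the flat per-channel lists and the function names `gram_matrix` and `gram_loss` are hypothetical.

```python
def gram_matrix(features):
    """Return the C x C Gram matrix of `features`, normalized by N.

    `features` is a list of C channels, each a flat list of N activations
    (an assumed layout, chosen only to keep the example dependency-free).
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
            for fi in features]


def gram_loss(feat_a, feat_b):
    """Mean squared difference between the Gram matrices of two feature maps."""
    ga, gb = gram_matrix(feat_a), gram_matrix(feat_b)
    c = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(c) for j in range(c)) / (c * c)
```

Because the Gram matrix sums over spatial positions, the loss is unchanged when the same spatial permutation is applied to every channel, which is one reason such statistics suit spatially homogeneous textures.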
translated by 谷歌翻译
Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and Dice losses, often fall short on boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function tailored to reflect boundary information and thereby enhance boundary detection. Because the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss, which incorporates a statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.
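The two ingredients named above, a Dice loss and a two-sample t-statistic, can be sketched separately as follows. This is not the PTA loss itself, whose exact form the abstract does not give. The Welch (unequal-variance) form of the t-statistic, the smoothing constant `eps`, and the function names are assumptions made for illustration.

```python
import math


def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between predicted probabilities and a binary mask."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def welch_t(a, b):
    """Two-sample t-statistic (Welch form, assumed here) for samples a and b."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
```

A large absolute t-statistic between predicted values on either side of a boundary indicates strong contrast between the two pixel groups, which is the kind of heterogeneity the abstract refers to.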
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network computes the correlation between current and previous features to significantly improve their temporal stability. Then, a reconstruction network based on a multi-scale U-Net with skip connections reconstructs and generates the desired high-resolution image. Experimental results and comparisons show that our method produces higher-quality supersampling results than current state-of-the-art methods without increasing the total number of ray-tracing samples.
Learning with noisy labels (LNL) is a classic problem that has been extensively studied for image tasks, but much less for video. A straightforward migration from images to videos that ignores the properties of videos, such as computational cost and redundant information, is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) a lightweight channel selection method, dubbed Channel Truncation, for feature-based label noise detection, which selects the most discriminative channels to split clean and noisy instances in each category; 2) a novel contrastive strategy, dubbed Noise Contrastive Learning, which constructs the relationship between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed truNcatE-split-contrAsT (NEAT) significantly outperforms the existing baselines. By reducing the feature dimension to 10% of its original size, our method achieves a noise detection F1-score above 0.4 and a 5% classification accuracy improvement on the Mini-Kinetics dataset under severe noise (symmetric-80%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 is over 1.6%.
This paper presents the development of an AI-based language learning platform Revita. It is a freely available intelligent online tutor, developed to support learners of multiple languages, from low-intermediate to advanced levels. It has been in pilot use by hundreds of students at several universities, whose feedback and needs are shaping the development. One of the main emerging features of Revita is the introduction of a system of linguistic constructs as the representation of domain knowledge. The system of constructs is developed in close collaboration with experts in language teaching. Constructs define the types of exercises, the content of the feedback, and enable the detailed modeling and evaluation of learning progress.
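This paper explores how to synthesize close-to-real blur so that existing video deblurring models trained on it generalize well to real-world blurry videos. In recent years, deep learning-based approaches have achieved promising results on the video deblurring task. However, models trained on existing synthetic datasets still generalize poorly to real-world blurry scenes, and the factors behind this failure remain unknown. We therefore revisit the classical blur synthesis pipeline and identify possible causes, including shooting parameters, the blur formation space, and the image signal processor (ISP). To analyze the effects of these potential factors, we first collect an ultra-high frame-rate (940 FPS) RAW video dataset as the data basis for synthesizing various kinds of blur. We then propose a novel realistic blur synthesis pipeline, termed RAW-Blur, that leverages blur formation cues. Through extensive experiments, we demonstrate that synthesizing blur in the RAW space and adopting the same ISP as the real-world test data effectively eliminates the negative effects of synthetic data. Furthermore, the shooting parameters of the synthesized blurry video, e.g., exposure time and frame rate, play a significant role in improving the performance of deblurring models. Impressively, models trained on blurry data synthesized by the proposed pipeline obtain a PSNR gain of more than 5 dB over those trained on existing synthetic blur datasets. We believe the novel realistic synthesis pipeline and the corresponding RAW video dataset can help the community easily construct customized blur datasets that improve real-world video deblurring performance, instead of laboriously collecting real data pairs.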
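To make the idea of synthesizing blur from high-frame-rate footage concrete, the sketch below averages consecutive sharp frames and measures quality with PSNR. It is only an illustration under stated assumptions: frame averaging in a linear intensity space is the classical synthesis recipe rather than the authors' full pipeline, frames are flat lists of values in [0, 1], and the names `synthesize_blur` and `psnr` are hypothetical.

```python
import math


def synthesize_blur(frames):
    """Average consecutive sharp frames to imitate a longer exposure.

    Each frame is a flat list of linear intensities; more frames correspond
    to a longer simulated exposure time.
    """
    n = len(frames)
    return [sum(pixels) / n for pixels in zip(*frames)]


def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)
```

With 940 FPS source footage, the number of averaged frames sets the simulated exposure time, which is one of the shooting parameters the abstract identifies as important.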
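Image retrieval has become an increasingly appealing technique with broad multimedia application prospects, where deep hashing serves as the dominant branch towards low storage and efficient retrieval. In this paper, we carry out an in-depth investigation of metric learning in deep hashing for establishing a powerful metric space in multi-label scenarios, where the pair loss suffers from high computational overhead and convergence difficulty, while the proxy loss is theoretically incapable of expressing profound label dependencies and exhibits conflicts in the constructed hypersphere space. To address these problems, we propose a novel metric learning framework with Hybrid Proxy-Pair Loss (HyP$^2$ Loss) that constructs an expressive metric space with efficient training complexity w.r.t. the whole dataset. The proposed HyP$^2$ Loss focuses on optimizing the hypersphere space with learnable proxies and excavating data-to-data correlations of irrelevant pairs, which integrates the sufficient data correspondence of pair-based methods and the high efficiency of proxy-based methods. Extensive experiments on four standard multi-label benchmarks show that the proposed method outperforms the state of the art, is robust across different hash bit lengths, and achieves significant performance gains with faster, more stable convergence. Our code is available at https://github.com/jerryxu0129/hyp2-loss.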
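Recently, learned video compression has attracted a lot of attention and shows a rapid development trend with promising results. However, previous works still suffer from some critical issues and have a performance gap with traditional compression standards in terms of the widely used PSNR metric. In this paper, we propose several techniques to effectively improve performance. First, to address the problem of accumulated error, we introduce a conditional I-frame as the first frame in the GoP, which stabilizes the reconstruction quality and saves bit rate. Second, to efficiently improve the accuracy of inter prediction without increasing decoder complexity, we propose a pixel-to-feature motion prediction method that helps us obtain high-quality motion information. Third, we propose a probability-based entropy skipping method, which not only brings a performance gain but also greatly reduces the runtime of entropy coding. With these techniques, this paper proposes AlphaVC, a high-performance and efficient learned video compression scheme. To the best of our knowledge, AlphaVC is the first E2E AI codec that exceeds the latest compression standard VVC on all common test datasets for both PSNR (-28.2% BD-rate saving) and MS-SSIM (-52.2% BD-rate saving), and has very fast encoding (0.001x VVC) and decoding (1.69x VVC) speeds.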
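Video-text pre-training (VTP) aims to learn transferable representations from large-scale web videos. To date, almost all existing VTP methods are limited to retrieval-based downstream tasks, e.g., video retrieval, whereas their transfer potential on localization-based tasks, e.g., temporal grounding, is under-explored. In this paper, we experimentally analyze and demonstrate the incompatibility of current VTP methods with localization tasks, and propose a novel localization-oriented video-text pre-training framework, dubbed LocVTP. Specifically, we perform fine-grained contrastive alignment as a complement to the coarse-grained one through a clip-word correspondence discovery scheme. To further enhance the temporal reasoning ability of the learned features, we propose a context projection head and a temporally aware contrastive loss to perceive contextual relationships. Extensive experiments on four downstream tasks across six datasets demonstrate that LocVTP achieves state-of-the-art performance on both retrieval-based and localization-based tasks. Furthermore, we conduct comprehensive ablation studies and thorough analyses to explore the optimal model designs and training strategies.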
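Fast adversarial training (FAT) effectively improves the efficiency of standard adversarial training (SAT). However, the original FAT suffers from catastrophic overfitting, i.e., the robust accuracy against adversarial attacks suddenly and dramatically decreases. Although several FAT variants go to great lengths to prevent overfitting, they incur substantial computational cost. In this paper, we explore the difference between the training processes of SAT and FAT and observe that the attack success rate of the adversarial examples (AEs) of FAT gradually worsens in the late training stage, resulting in overfitting. The AEs are generated by the fast gradient sign method (FGSM) with zero or random initialization. Based on this observation, and after investigating several initialization strategies, we propose a prior-guided FGSM initialization method to avoid overfitting, which improves the quality of the AEs throughout the training process. The initialization is formed by leveraging historically generated AEs at no additional computational cost. We further provide a theoretical analysis of the proposed initialization method. We also propose a simple yet effective regularizer based on the prior-guided initialization, i.e., the currently generated perturbation should not deviate too much from the prior-guided initialization. The regularizer uses both historical and current adversarial perturbations to guide model learning. Evaluations on four datasets demonstrate that the proposed method prevents catastrophic overfitting and outperforms state-of-the-art FAT methods. The code is released at https://github.com/jiaxiaojunqaq/fgsm-pgi.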
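A single FGSM step with a configurable starting perturbation can be sketched as follows. This is a toy illustration, not the released implementation: it assumes a linear logistic classifier so the input gradient has a closed form, and the `init` argument merely stands in for a perturbation reused from earlier training; how the prior is actually chosen is not specified in the abstract. All function and parameter names are hypothetical.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def input_gradient(w, b, x, y):
    """Gradient of the logistic loss w.r.t. the input x of a linear classifier."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]


def fgsm_perturbation(w, b, x, y, eps, alpha, init=None):
    """One FGSM step from `init` (zeros by default), clipped to [-eps, eps]."""
    init = init if init is not None else [0.0] * len(x)
    x_start = [xi + di for xi, di in zip(x, init)]
    grad = input_gradient(w, b, x_start, y)
    delta = [di + alpha * sign(g) for di, g in zip(init, grad)]
    return [max(-eps, min(eps, d)) for d in delta]
```

Passing the perturbation returned for a sample in an earlier pass as `init` for the next one mimics, in spirit, reusing historical adversarial examples without extra gradient computations.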